An Epipolar Line from a Single Pixel
Computing the epipolar geometry from feature points between cameras with very
different viewpoints is often error prone, as an object's appearance can vary
greatly between images. For such cases, it has been shown that using motion
extracted from video can achieve much better results than using a static image.
This paper extends these earlier works that exploit scene dynamics. We propose a new method to compute the epipolar geometry from a video
stream, by exploiting the following observation: For a pixel p in Image A, all
pixels corresponding to p in Image B are on the same epipolar line.
Equivalently, the image of the line going through camera A's center and p is an
epipolar line in B. Therefore, when cameras A and B are synchronized, the
momentary images of two objects projecting to the same pixel, p, in camera A at
times t1 and t2, lie on an epipolar line in camera B. Based on this observation
we achieve fast and precise computation of epipolar lines. Calibrating cameras
based on our method of finding epipolar lines is much faster and more robust
than previous methods.
Comment: WACV 201
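The observation above lends itself to a very small computation: in homogeneous coordinates, the line through two image points is their cross product, so an epipolar line in camera B can be recovered directly from the two momentary images of the objects that projected to the same pixel p in camera A at times t1 and t2. A minimal sketch of that step (the pixel coordinates are made-up illustration values, not data from the paper):

```python
import numpy as np

def epipolar_line_from_coincident_pixels(b1, b2):
    """Epipolar line in camera B through the images of two objects that
    projected to the same pixel p in camera A at times t1 and t2.
    b1, b2 are homogeneous 2D points (x, y, 1)."""
    line = np.cross(b1, b2)              # line through two points, homogeneous coords
    return line / np.linalg.norm(line[:2])  # normalize so distances are in pixels

# Illustrative observations in camera B at times t1 and t2:
b1 = np.array([100.0, 50.0, 1.0])
b2 = np.array([220.0, 90.0, 1.0])
l = epipolar_line_from_coincident_pixels(b1, b2)
# Both observations satisfy the line equation l . b = 0.
print(l @ b1, l @ b2)
```

With many such lines collected across pixels, the epipole and fundamental matrix follow by standard multi-view geometry; the sketch covers only the per-pixel line recovery that the observation makes possible.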
Seeing Through Noise: Visually Driven Speaker Separation and Enhancement
Isolating the voice of a specific person while filtering out other voices or
background noises is challenging when video is shot in noisy environments. We
propose audio-visual methods to isolate the voice of a single speaker and
eliminate unrelated sounds. First, face motions captured in the video are used
to estimate the speaker's voice, by passing the silent video frames through a
video-to-speech neural network-based model. Then the speech predictions are
applied as a filter on the noisy input audio. This approach avoids using
mixtures of sounds in the learning process, as the number of such possible
mixtures is huge, and would inevitably bias the trained model. We evaluate our
method on two audio-visual datasets, GRID and TCD-TIMIT, and show that our
method attains significant SDR and PESQ improvements over the raw
video-to-speech predictions, and a well-known audio-only method.
Comment: Supplementary video: https://www.youtube.com/watch?v=qmsyj7vAzo
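The filtering step, using the predicted speech as a filter on the noisy audio, can be illustrated with a generic spectral soft mask. The abstract does not specify the exact filter, so the mask shape (a clipped power-ratio mask) and the STFT parameters below are assumptions, not the paper's method:

```python
import numpy as np
from scipy.signal import stft, istft

def enhance_with_predicted_speech(noisy, predicted, fs=16000, nperseg=512):
    """Apply a video-derived speech prediction as a soft spectral filter
    on the noisy input audio. Generic masking sketch; the paper's actual
    filter may differ."""
    _, _, N = stft(noisy, fs=fs, nperseg=nperseg)      # noisy spectrogram
    _, _, P = stft(predicted, fs=fs, nperseg=nperseg)  # predicted speech spectrogram
    # Keep time-frequency bins where the predicted speech carries energy,
    # attenuate bins dominated by other sounds (clipped power-ratio mask).
    mask = np.clip(np.abs(P) ** 2 / (np.abs(N) ** 2 + 1e-8), 0.0, 1.0)
    _, enhanced = istft(mask * N, fs=fs, nperseg=nperseg)
    return enhanced
```

Because the mask is built from the video-derived prediction rather than learned from audio mixtures, it sidesteps the combinatorial explosion of possible sound mixtures mentioned in the abstract.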